Summary. The danger of confusing long-range dependence with non-stationarity
has been pointed out by many authors. Finding an answer to this difficult question
is important for modeling time series showing trend-like behavior, such as river
run-off in hydrology, historical temperatures in the study of climate change, or
packet counts in network traffic engineering.

The main goal of this paper is to develop a test procedure to detect the presence
of non-stationarity for a class of processes whose K-th order difference is stationary.
Contrary to most of the proposed methods, the test statistic has the same limiting
distribution for short-range and long-range dependent covariance stationary processes,
which means that this test is able to detect the presence of non-stationarity for
processes that show long-range dependence or have a unit root.

The proposed test is formulated in the wavelet domain, where a change in the
generalized spectral density results in a change in the variance of wavelet coefficients
at one or several scales. Such tests have already been proposed in Whitcher et al.
(2001), but these authors did not take into account the dependence of the
wavelet coefficients within scales and between scales. Therefore, the asymptotic
distribution of the test they proposed was erroneous; as a consequence, the level
of the test under the null hypothesis of stationarity was wrong.

In this contribution, we introduce two test procedures, both using an estimator
of the variance of the scalogram at one or several scales. The asymptotic distribution
of the test under the null is rigorously justified. The pointwise consistency of the test
in the presence of a single jump in the generalized spectral density is also presented.

A limited Monte-Carlo experiment is performed to illustrate our findings. 



2 O. Kouamo, E. Moulines, and F. Roueff 

1.1 Introduction 



For time series of short duration, stationarity and short-range dependence
have usually been regarded as approximately valid. However, such an assumption
becomes questionable in the large data sets currently investigated in
geophysics, hydrology or financial econometrics. There has been a long-lasting
controversy over whether the deviations from "short memory stationarity"
should be attributed to long-range dependence or are related to the presence of
breakpoints in the mean, the variance, the covariance function or other types
of more sophisticated structural changes. The links between non-stationarity
and long-range dependence (LRD) were pointed out by many authors in
the hydrology literature long ago: Klemes (1974) and Boes and Salas (1978)
show that non-stationarity in the mean provides a possible explanation of
the so-called Hurst phenomenon. Potter (1976) and later Rao and Yu (1986)
suggested that more sophisticated changes may occur, and proposed a
method to detect such changes. The possible confusion between long memory
and some forms of non-stationarity has been discussed in the applied probability
literature: Bhattacharya et al. (1983) show that long-range dependence
may be confused with the presence of a small monotonic trend. This phenomenon
has also been discussed in the econometrics literature. Hidalgo and
Robinson (1996) proposed a test for the presence of structural change in a long
memory environment. Granger and Hyung (1999) showed that linear processes
with breaks can mimic the autocovariance structure of a linear fractionally
integrated long-memory process (a stationary process that encounters occasional
regime switches will have some properties that are similar to those of a
long-memory process). Similar behaviors are considered in Diebold and Inoue
(2001), who provided simple and intuitive econometric models showing that
long memory and structural changes are easily confused. Mikosch and Starica
(2004) asserted that what had been seen by many authors as long memory
in the volatility of the absolute values or the square of the log-returns might,
in fact, be explained by abrupt changes in the parameters of an underlying
GARCH-type model. Berkes et al. (2006) proposed a testing procedure for
distinguishing between a weakly dependent time series with change-points in
the mean and a long-range dependent time series. Hurvich et al. (2005) have
proposed a test procedure for detecting long memory in the presence of
deterministic trends.

The procedure described in this paper deals with the problem of detecting
changes which may occur in the spectral content of a process. We will consider
a process $X$ which, before and after the change, is not necessarily stationary
but whose difference of at least a given order is stationary, so that polynomial
trends up to that order can be discarded. Denote by $\Delta X$ the first order
difference of $X$,

$$[\Delta X]_n \overset{\text{def}}{=} X_n - X_{n-1}, \quad n \in \mathbb{Z},$$

and define, for an integer $K \geq 1$, the K-th order difference recursively as
follows: $\Delta^K = \Delta \circ \Delta^{K-1}$. A process $X$ is said to be K-th order
difference stationary if $\Delta^K X$ is covariance stationary. Let $f$ be a non-negative
$2\pi$-periodic symmetric function such that there exists an integer $K$ satisfying
$\int_{-\pi}^{\pi} |1 - e^{-i\lambda}|^{2K} f(\lambda)\, d\lambda < \infty$. We say that the process $X$ admits generalized
spectral density $f$ if $\Delta^K X$ is weakly stationary with spectral density
function

$$f_K(\lambda) = |1 - e^{-i\lambda}|^{2K} f(\lambda). \tag{1.1}$$

This class of processes includes both short-range dependent and long-range
dependent processes, but also unit-root and fractional unit-root processes. The
main goal of this paper is to develop a testing procedure for distinguishing
between a K-th order difference stationary process and a non-stationary process.
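
As a small illustration of why differencing discards polynomial trends, the sketch below applies the first-order difference repeatedly to a deterministic quadratic trend. The function name `kth_difference` and the particular trend are illustrative assumptions and are not taken from the paper.

```python
def kth_difference(x, K):
    """Apply the first-order difference [Delta X]_n = X_n - X_{n-1} K times."""
    x = [float(v) for v in x]
    for _ in range(K):
        x = [x[i] - x[i - 1] for i in range(1, len(x))]
    return x

# Hypothetical quadratic trend (degree 2), chosen only for illustration.
trend = [3.0 + 2.0 * n - 0.5 * n ** 2 for n in range(20)]

second = kth_difference(trend, 2)  # degree-2 trend becomes a constant sequence
third = kth_difference(trend, 3)   # one more difference removes it entirely
print(second[:3], third[:3])
```

Each difference shortens the sequence by one observation, so a sample of length n leaves n - K values after K differences.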

In this paper, we consider the so-called a posteriori or retrospective method
(see Brodsky and Darkhovsky, 2000, Chapter 3). The proposed test is formulated
in the wavelet domain, where a change in the generalized spectral
density results in a change in the variance of the wavelet coefficients. Our
test is based on a CUSUM statistic, which is perhaps the most extensively
used statistic for detecting and estimating change-points in mean. In our
procedure, the CUSUM is applied to the partial sums of the squared wavelet
coefficients at a given scale or on a specific range of scales. This procedure
extends the test introduced in Inclan and Tiao (1994) to detect changes in
the variance of an independent sequence of random variables. To describe the
idea, suppose that, under the null hypothesis, the time series is K-th order
difference stationary and that, under the alternative, there is one breakpoint
where the generalized spectral density of the process changes. We consider the
scalogram in the range of scales $J_1, J_1 + 1, \dots, J_2$. Under the null hypothesis,
there is no change in the variance of the wavelet coefficients at any given scale
$j \in \{J_1, \dots, J_2\}$. Under the alternative, these variances take different values
before and after the change point. The amplitude of the change depends on
the scale and on the change in the generalized spectral density. We consider the
$(J_2 - J_1 + 1)$-dimensional W2-CUSUM statistic $\{T_{J_1,J_2}(t),\ t \in [0, 1]\}$ defined by
(1.41), which is a CUSUM-like statistic applied to the squares of the wavelet
coefficients. Using $T_{J_1,J_2}(t)$ we can construct an estimator $\hat{\tau}_{J_1,J_2}$ of the change
point (no matter if a change-point exists or not), by minimizing an appropriate
norm of the W2-CUSUM statistics, $\hat{\tau}_{J_1,J_2} = \operatorname{Argmin}_{t \in [0,1]} \|T_{J_1,J_2}(t)\|_{*}$.
The statistic $T_{J_1,J_2}(\hat{\tau}_{J_1,J_2})$ converges to a well-known distribution under the
null hypothesis (see Theorems 1 and 2) but diverges to infinity under the
alternative (Theorems 3 and 4). A similar idea has been proposed by Whitcher
et al. (2001) but these authors did not take into account the dependence of
wavelet coefficients, resulting in an erroneous normalization and asymptotic
distributions.
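
For orientation, the CUSUM-of-squares idea of Inclan and Tiao (1994) for an independent sequence can be sketched as follows. The sketch assumes the form in which that statistic is usually stated, namely the centered cumulative sum of squares $D_k = C_k / C_T - k/T$ with $C_k = \sum_{t \le k} a_t^2$, scaled by $\sqrt{T/2}$; the function name and the toy data are illustrative assumptions, and this is not the W2-CUSUM test of this paper.

```python
import math

def inclan_tiao(a):
    """Return (sqrt(T/2) * max_k |D_k|, argmax k) for D_k = C_k / C_T - k / T."""
    T = len(a)
    total = sum(v * v for v in a)          # C_T, assumed strictly positive
    best, k_star, c = 0.0, 1, 0.0
    for k, v in enumerate(a, start=1):
        c += v * v                         # running cumulative sum of squares C_k
        d = abs(c / total - k / T)
        if d > best:
            best, k_star = d, k
    return math.sqrt(T / 2.0) * best, k_star

# Toy sequence whose scale triples after the fourth observation.
stat, k_star = inclan_tiao([1.0] * 4 + [3.0] * 4)
print(stat, k_star)
```

The normalization by $\sqrt{T/2}$ is only appropriate for independent Gaussian observations; handling dependent wavelet coefficients is precisely what requires the different denominator discussed in Section 1.3.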

The paper is organized as follows. In Section 1.2, we introduce the wavelet 
setting and the relationship between the generalized spectral density and the 
variance of wavelet coefficients at a given scale. In Section 1.3, our main
assumptions are formulated and the asymptotic distribution of the W2-CUSUM
statistics is presented first in the single-scale case (sub-section 1.3.1) and then in
the multiple-scale case (sub-section 1.3.2). In Section 1.4, several possible
test procedures are described to detect the presence of changes at a single scale 
or simultaneously at several scales. In Section 1.6, finite sample performance 
of the test procedure is studied based on Monte-Carlo experiments. 

1.2 The wavelet transform of K-th order difference 
stationary processes 

In this section, we introduce the wavelet setting, define the scalogram and
explain how spectral change-points can be observed in the wavelet domain.
The main advantage of using the wavelet domain is to alleviate problems
arising when the time series is long-range dependent. We will recall
some basic results obtained in Moulines et al. (2007) to support our claims.
We refer the reader to that paper for the proofs of the stated results.
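The wavelet setting. The wavelet setting involves two functions $\phi$ and $\psi$
and their Fourier transforms

$$\hat{\phi}(\xi) \overset{\text{def}}{=} \int_{-\infty}^{\infty} \phi(t)\, e^{-i\xi t}\, dt \quad\text{and}\quad \hat{\psi}(\xi) \overset{\text{def}}{=} \int_{-\infty}^{\infty} \psi(t)\, e^{-i\xi t}\, dt,$$

and assume the following:

(W-1) $\phi$ and $\psi$ are compactly-supported, integrable, and $\hat{\phi}(0) = \int_{-\infty}^{\infty} \phi(t)\, dt = 1$
      and $\int_{-\infty}^{\infty} \psi^2(t)\, dt = 1$.
(W-2) There exists $\alpha > 1$ such that $\sup_{\xi \in \mathbb{R}} |\hat{\psi}(\xi)|\, (1 + |\xi|)^{\alpha} < \infty$.
(W-3) The function $\psi$ has $M$ vanishing moments, i.e. $\int_{-\infty}^{\infty} t^m \psi(t)\, dt = 0$
      for all $m = 0, \dots, M - 1$.
(W-4) The function $\sum_{k \in \mathbb{Z}} k^m \phi(\cdot - k)$ is a polynomial of degree $m$ for all
      $m = 0, \dots, M - 1$.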

The fact that both $\phi$ and $\psi$ have finite support (Condition (W-1)) ensures that
the corresponding filters (see (1.7)) have finite impulse responses (see (1.9)).
While the support of the Fourier transform of $\psi$ is the whole real line,
Condition (W-2) ensures that this Fourier transform decreases quickly to zero.
Condition (W-3) is an important characteristic of wavelets: it ensures that
they oscillate and that their scalar product with continuous-time polynomials
up to degree $M - 1$ vanishes. Daubechies wavelets and Coiflets having at least
two vanishing moments satisfy these conditions.
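
The vanishing-moment condition can be checked numerically on a filter. The sketch below uses the standard four-tap Daubechies low-pass coefficients (an assumption brought in for illustration, since the paper does not list any filter) and verifies that the discrete moments of the associated high-pass filter vanish for $m = 0, 1$ but not for $m = 2$. The discrete check is a common proxy for the continuous-time condition (W-3) with $M = 2$.

```python
import math

s3 = math.sqrt(3.0)
# Standard four-tap Daubechies low-pass filter (assumed here for illustration).
h = [v / (4.0 * math.sqrt(2.0)) for v in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]
# Associated high-pass (wavelet) filter by the alternating-flip construction.
g = [(-1) ** k * h[3 - k] for k in range(4)]

def moment(filt, m):
    """Discrete moment sum_k k^m filt[k]."""
    return sum((k ** m) * c for k, c in enumerate(filt))

moments = [moment(g, m) for m in range(3)]
print(moments)  # the first two entries are zero up to rounding, the third is not
```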

Viewing the wavelet $\psi(t)$ as a basic template, define the family
$\{\psi_{j,k},\ j \in \mathbb{Z},\ k \in \mathbb{Z}\}$ of translated and dilated functions

$$\psi_{j,k}(t) = 2^{-j/2}\, \psi(2^{-j} t - k), \quad j \in \mathbb{Z},\ k \in \mathbb{Z}. \tag{1.2}$$

Positive values of $k$ translate $\psi$ to the right, negative values to the left. The
scale index $j$ dilates $\psi$ so that large values of $j$ correspond to coarse scales
and hence to low frequencies.

Assumptions (W-1)-(W-4) are standard in the context of a multiresolution
analysis (MRA) in which case $\phi$ is the scaling function and $\psi$ is the associated
wavelet, see for instance Mallat (1998); Cohen (2003). Daubechies wavelets
and Coiflets are examples of orthogonal wavelets constructed using an MRA.
In this paper, we do not assume the wavelets to be orthonormal nor that
they are associated to a multiresolution analysis. We may therefore work with
other convenient choices for $\phi$ and $\psi$ as long as (W-1)-(W-4) are satisfied.
Discrete Wavelet Transform (DWT) in discrete time. We now describe
how the wavelet coefficients are defined in discrete time, that is for a
real-valued sequence $\{x_k,\ k \in \mathbb{Z}\}$ and for a finite sample $\{x_k,\ k = 1, \dots, n\}$. Using
the scaling function $\phi$, we first interpolate these discrete values to construct
the following continuous-time functions

$$x_n(t) \overset{\text{def}}{=} \sum_{k=1}^{n} x_k\, \phi(t - k) \quad\text{and}\quad x(t) \overset{\text{def}}{=} \sum_{k \in \mathbb{Z}} x_k\, \phi(t - k), \quad t \in \mathbb{R}. \tag{1.3}$$

Without loss of generality we may suppose that the support of the scaling
function $\phi$ is included in $[-T, 0]$ for some integer $T \geq 1$. Then

$$x_n(t) = x(t) \quad\text{for all } t \in [0, n - T + 1].$$

We may also suppose that the support of the wavelet function $\psi$ is included in
$[0, T]$. With these conventions, the support of $\psi_{j,k}$ is included in the interval
$[2^j k, 2^j (k + T)]$. Let tq be an arbitrary shift order. The wavelet coefficient $W^{x}_{j,k}$
at scale $j \geq 0$ and location $k \in \mathbb{Z}$ is formally defined as the scalar product in
$L^2(\mathbb{R})$ of the function $t \mapsto x(t)$ and the wavelet $t \mapsto \psi_{j,k}(t)$:

$$W^{x}_{j,k} \overset{\text{def}}{=} \int_{-\infty}^{\infty} x(t)\, \psi_{j,k}(t)\, dt = \int_{-\infty}^{\infty} x_n(t)\, \psi_{j,k}(t)\, dt, \quad j \geq 0,\ k \in \mathbb{Z}, \tag{1.4}$$

when $[2^j k, 2^j (k + T)] \subseteq [0, n - T + 1]$, that is, for all $(j, k) \in \mathcal{I}_n$, where
$\mathcal{I}_n = \{(j, k) : j \geq 0,\ 0 \leq k < n_j\}$ with $n_j = 2^{-j}(n - T + 1) - T + 1$. It is important
to observe that the definition of the wavelet coefficient $W_{j,k}$ at a given index
$(j, k)$ does not depend on the sample size $n$ (this is in sharp contrast with
Fourier coefficients). For ease of presentation, we will use the convention that
at each scale $j$, the first available wavelet coefficient $W_{j,k}$ is indexed by $k = 0$,
that is,

$$\mathcal{I}_n \overset{\text{def}}{=} \{(j, k) : j \geq 0,\ 1 \leq k \leq n_j\} \quad\text{with}\quad n_j = 2^{-j}(n - T + 1) - T + 1. \tag{1.5}$$

Practical implementation. In practice the DWT of $\{x_k,\ k = 1, \dots, n\}$ is
not computed using (1.4) but by linear filtering and decimation. Indeed the
wavelet coefficient $W^{x}_{j,k}$ can be expressed as

$$W^{x}_{j,k} = \sum_{l \in \mathbb{Z}} x_l\, h_{j, 2^j k - l}, \quad (j, k) \in \mathcal{I}_n, \tag{1.6}$$



1 Testing for homogeneity of variance in the wavelet domain. 7 

where

$$h_{j,l} \overset{\text{def}}{=} 2^{-j/2} \int_{-\infty}^{\infty} \phi(t + l)\, \psi(2^{-j} t)\, dt. \tag{1.7}$$

For all $j \geq 0$, the discrete Fourier transform of the transfer function $\{h_{j,l}\}_{l \in \mathbb{Z}}$
is

$$H_j(\lambda) \overset{\text{def}}{=} \sum_{l \in \mathbb{Z}} h_{j,l}\, e^{-i\lambda l} = 2^{-j/2} \int_{-\infty}^{\infty} \sum_{l \in \mathbb{Z}} \phi(t + l)\, e^{-i\lambda l}\, \psi(2^{-j} t)\, dt. \tag{1.8}$$

Since $\phi$ and $\psi$ have compact support, the sum in (1.8) has only a finite number
of non-vanishing terms and $H_j(\lambda)$ is the transfer function of a finite impulse
response filter,

$$H_j(\lambda) = \sum_{l = -T(2^j + 1) + 1}^{-1} h_{j,l}\, e^{-i\lambda l}. \tag{1.9}$$

When $\phi$ and $\psi$ are the scaling and the wavelet functions associated to a MRA,
the wavelet coefficients may be obtained recursively by applying a finite order
filter and downsampling by an order 2. This recursive procedure is referred to
as the pyramidal algorithm, see for instance Mallat (1998).
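
The filter-and-decimate recursion can be sketched with the Haar pair, chosen only because its filters have two taps. This is an illustrative assumption: the Haar wavelet has a single vanishing moment and is not claimed to satisfy (W-2), and the function name `haar_pyramid` is hypothetical.

```python
import math

def haar_pyramid(x, max_scale):
    """Haar detail coefficients at scales 1..max_scale by filtering and decimation."""
    approx = [float(v) for v in x]
    coeffs = {}
    r = math.sqrt(2.0)
    for j in range(1, max_scale + 1):
        m = len(approx) // 2
        if m == 0:
            break
        # High-pass filter followed by downsampling by 2: detail coefficients.
        coeffs[j] = [(approx[2 * k] - approx[2 * k + 1]) / r for k in range(m)]
        # Low-pass filter followed by downsampling by 2: input to the next scale.
        approx = [(approx[2 * k] + approx[2 * k + 1]) / r for k in range(m)]
    return coeffs

coeffs = haar_pyramid([1.0, -1.0] * 4, max_scale=3)
print({j: len(w) for j, w in coeffs.items()})  # {1: 4, 2: 2, 3: 1}
```

The number of available coefficients roughly halves at each scale, which mirrors the behavior of $n_j$ as $j$ increases.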
The wavelet spectrum and the scalogram. Let $X = \{X_t,\ t \in \mathbb{Z}\}$ be a
real-valued process with wavelet coefficients $\{W_{j,k},\ k \in \mathbb{Z}\}$ and define

$$\sigma^2_{j,k} = \mathrm{Var}(W_{j,k}).$$

If $\Delta^M X$ is stationary, by Eq (16) in Moulines et al. (2007), we have that, for
all $j$, the process of its wavelet coefficients at scale $j$, $\{W_{j,k},\ k \in \mathbb{Z}\}$, is also
stationary. Then, the wavelet variance $\sigma^2_{j,k}$ does not depend on $k$, $\sigma^2_{j,k} = \sigma^2_j$.
The sequence $(\sigma^2_j)_{j \geq 0}$ is called the wavelet spectrum of the process $X$.

If moreover $\Delta^M X$ is centered, the wavelet spectrum can be estimated by
using the scalogram, defined as the empirical mean of the squared wavelet
coefficients computed from the sample $X_1, \dots, X_n$:

$$\hat{\sigma}^2_j = \frac{1}{n_j} \sum_{k=1}^{n_j} W^2_{j,k}.$$

By (Moulines et al., 2007, Proposition 1), if $K \leq M$, then the wavelet spectrum of $X$
can be expressed using the generalized spectral density $f$ appearing in (1.1)
and the filters $H_j$ defining the DWT in (1.8) as follows:

$$\sigma^2_j = \int_{-\pi}^{\pi} |H_j(\lambda)|^2 f(\lambda)\, d\lambda, \quad j \geq 0. \tag{1.10}$$
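
Relation (1.10) can be evaluated numerically for a given filter and generalized spectral density. The sketch below is an illustration under stated assumptions: a two-tap Haar-type filter stands in for $H_j$, the white-noise density is $1/(2\pi)$, and the random-walk generalized density is $1/(2\pi |1 - e^{-i\lambda}|^2)$ (so $K = 1$). The midpoint grid avoids the pole at $\lambda = 0$; all names are hypothetical.

```python
import cmath
import math

def wavelet_variance(h, f, n_grid=4096):
    """Midpoint-rule approximation of the integral of |H(lam)|^2 f(lam) over [-pi, pi]."""
    step = 2.0 * math.pi / n_grid
    total = 0.0
    for i in range(n_grid):
        lam = -math.pi + (i + 0.5) * step
        # Transfer function H(lam) = sum_l h[l] exp(-i lam l).
        H = sum(c * cmath.exp(-1j * lam * l) for l, c in enumerate(h))
        total += abs(H) ** 2 * f(lam)
    return total * step

haar = [1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)]

def white(lam):
    return 1.0 / (2.0 * math.pi)

def random_walk(lam):
    return 1.0 / (2.0 * math.pi * abs(1.0 - cmath.exp(-1j * lam)) ** 2)

v_white = wavelet_variance(haar, white)        # close to 1
v_rw = wavelet_variance(haar, random_walk)     # close to 1/2, finite despite the pole
print(v_white, v_rw)
```

The second value stays finite because the filter vanishes at frequency zero, which is the mechanism behind the condition $K \leq M$.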
1.3 Asymptotic distribution of the W2-CUSUM
statistics 

1.3.1 The single-scale case 

To start with a simple presentation and statement of results, we first focus in this
section on a test procedure aimed at detecting a change in the variance of the
wavelet coefficients at a single scale $j$. Let $X_1, \dots, X_n$ be the $n$ observations
of a time series, and denote by $W_{j,k}$ for $(j, k) \in \mathcal{I}_n$, with $\mathcal{I}_n$ defined in (1.5),
the associated wavelet coefficients. In view of (1.10), if $X_1, \dots, X_n$ are $n$
successive observations of a K-th order difference stationary process, then the
wavelet variance at each given scale $j$ should be constant. If the process $X$ is
not K-th order stationary, then it can be expected that the wavelet variance
will change either gradually or abruptly (if there is a shock in the original
time series). This suggests investigating the constancy of the variance
of the wavelet coefficients.

There are many works aimed at detecting a change point in the variance
of a sequence of independent random variables; this problem has also been
considered, but much less frequently, for sequences of dependent variables.
Here, under the null assumption of K-th order difference stationarity, the
wavelet coefficients $\{W_{j,k},\ k \in \mathbb{Z}\}$ form a covariance stationary sequence whose
spectral density is given by (see Moulines et al., 2007, Corollary 1)

$$D_{j,0}(\lambda; f) \overset{\text{def}}{=} \sum_{l=0}^{2^j - 1} f(2^{-j}(\lambda + 2l\pi))\, 2^{-j}\, |H_j(2^{-j}(\lambda + 2l\pi))|^2. \tag{1.11}$$

We will adapt the approach developed in Inclan and Tiao (1994), which uses
cumulative sum (CUSUM) of squares to detect change points in the variance.

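In order to define the test statistic, we first introduce a change point
estimator for the mean of the square of the wavelet coefficients at each scale
$j$,

$$\hat{k}_j = \operatorname{argmax}_{1 \leq k \leq n_j} \left| \sum_{1 \leq i \leq k} W^2_{j,i} - \frac{k}{n_j} \sum_{1 \leq i \leq n_j} W^2_{j,i} \right|. \tag{1.12}$$

Using this change point estimator, the W2-CUSUM statistics is defined as

$$T_{n_j} = \frac{1}{n_j^{1/2}\, s_{j,n_j}} \left| \sum_{1 \leq i \leq \hat{k}_j} W^2_{j,i} - \frac{\hat{k}_j}{n_j} \sum_{1 \leq i \leq n_j} W^2_{j,i} \right|, \tag{1.13}$$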
where $s^2_{j,n_j}$ is a suitable estimator of the variance of the sample mean of the
$W^2_{j,i}$. Because wavelet coefficients at a given scale are correlated, we use the
Bartlett estimator of the variance, which is defined by

$$s^2_{j,n_j} = \hat{\gamma}_j(0) + 2 \sum_{1 \leq l \leq q(n_j)} w_l(q(n_j))\, \hat{\gamma}_j(l), \tag{1.14}$$

where

$$\hat{\gamma}_j(l) = \frac{1}{n_j} \sum_{1 \leq i \leq n_j - l} (W^2_{j,i} - \hat{\sigma}^2_j)(W^2_{j,i+l} - \hat{\sigma}^2_j) \tag{1.15}$$

are the sample autocovariances of $\{W^2_{j,i},\ i = 1, \dots, n_j\}$, $\hat{\sigma}^2_j$ is the scalogram
and, for a given integer $q$,

$$w_l(q) = 1 - \frac{l}{1 + q}, \quad l \in \{0, \dots, q\} \tag{1.16}$$

are the so-called Bartlett weights.
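
A Bartlett-weighted variance estimator of this kind can be sketched in a few lines. The sketch assumes the usual form: sample autocovariances of the squared coefficients, centered at their empirical mean and normalized by the sample size, combined with weights $1 - l/(1+q)$. The function name and the toy input are illustrative.

```python
def bartlett_variance(y, q):
    """Bartlett (Newey-West type) long-run variance estimate of the sequence y.

    y is meant to hold the squared wavelet coefficients at one scale;
    q is the truncation lag.
    """
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]

    def gamma(l):
        # Sample autocovariance at lag l, normalized by n.
        return sum(d[i] * d[i + l] for i in range(n - l)) / n

    s2 = gamma(0)
    for l in range(1, q + 1):
        s2 += 2.0 * (1.0 - l / (1.0 + q)) * gamma(l)
    return s2

print(bartlett_variance([1.0, 2.0, 3.0, 4.0], q=1))  # 1.5625
```

With $q = 0$ the estimator reduces to the empirical variance, which is the normalization appropriate for uncorrelated terms.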

The test differs from the statistics proposed in Inclan and Tiao (1994) only
in its denominator, which is the square root of a consistent estimator of the
partial sum's variance. If $\{X_n\}$ is short-range dependent, the variance of the
partial sum of the scalograms is not simply the sum of the variances of the
individual squared wavelet coefficients, but also includes the autocovariances
of these terms. Therefore, the estimator of the averaged scalogram variance
involves not only sums of squared deviations of the scalogram coefficients, but
also its weighted autocovariances up to lag $q(n_j)$. The weights $\{w_l(q(n_j))\}$
are those suggested by Newey and West (1987) and always yield a positive
sequence of autocovariance, and a positive estimator of the (unnormalized)
wavelet spectrum at scale $j$, at frequency zero using a Bartlett window. We
will first establish the consistency of the estimator $s^2_{j,n_j}$ of the variance of
the scalogram at scale $j$ and the convergence of the empirical process of the
squared wavelet coefficients to the Brownian motion. Denote by $D([0,1])$
the Skorokhod space of functions which are right continuous at each point of
$[0, 1)$ with left limits on $(0, 1]$ (or cadlag functions). This space is, in the sequel,
equipped with the classical Skorokhod metric.
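
Putting the pieces together, a single-scale statistic of this type can be sketched as below. The sketch assumes that the statistic is the maximal absolute centered partial sum of the squared coefficients, divided by $n_j^{1/2}$ times the square root of a Bartlett variance estimate; the function name `w2_cusum`, the fixed lag and the toy coefficients are assumptions made for illustration.

```python
import math

def w2_cusum(w, q):
    """Return (statistic, k_hat) for the wavelet coefficients w at one scale."""
    y = [v * v for v in w]                      # squared wavelet coefficients
    n = len(y)
    mean = sum(y) / n
    d = [v - mean for v in y]

    def gamma(l):
        return sum(d[i] * d[i + l] for i in range(n - l)) / n

    # Bartlett variance estimate with truncation lag q (assumed positive here).
    s2 = gamma(0) + 2.0 * sum((1.0 - l / (1.0 + q)) * gamma(l)
                              for l in range(1, q + 1))
    total, partial = sum(y), 0.0
    best, k_hat = -1.0, 0
    for k in range(1, n + 1):
        partial += y[k - 1]
        dev = abs(partial - k / n * total)      # centered partial sum at k
        if dev > best:
            best, k_hat = dev, k
    return best / math.sqrt(n * s2), k_hat

# Toy coefficients whose variance changes after the fourth value.
stat, k_hat = w2_cusum([1.0] * 4 + [2.0] * 4, q=1)
print(stat, k_hat)
```

A constant input makes the variance estimate zero, so real use would need a guard against division by zero.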

Theorem 1. Suppose that $X$ is a Gaussian process with generalized spectral
density $f$. Let $(\phi, \psi)$ be a scaling and a wavelet function satisfying (W-1)-(W-4).
Let $\{q(n_j)\}$ be a non decreasing sequence of integers satisfying

$$q(n_j) \to \infty \quad\text{and}\quad q(n_j)/n_j \to 0 \quad\text{as } n_j \to \infty. \tag{1.17}$$

Assume that $\Delta^M X$ is non-deterministic and centered, and that $\lambda^{2M} f(\lambda)$ is
two times differentiable in $\lambda$ with bounded second order derivative. Then for
any fixed scale $j$, as $n \to \infty$,

$$s^2_{j,n_j} \overset{P}{\longrightarrow} \frac{1}{\pi} \int_{-\pi}^{\pi} |D_{j,0}(\lambda; f)|^2\, d\lambda, \tag{1.18}$$

where $D_{j,0}(\lambda; f)$ is the wavelet coefficients spectral density at scale $j$, see
(1.11). Moreover, defining $\sigma^2_j$ by (1.10),

$$\frac{1}{n_j^{1/2}\, s_{j,n_j}} \sum_{i=1}^{\lfloor n_j t \rfloor} (W^2_{j,i} - \sigma^2_j) \overset{d}{\longrightarrow} B(t) \quad\text{in } D([0,1]), \tag{1.19}$$

where $(B(t),\ t \in [0,1])$ is the standard Brownian motion.

Remark 1. The fact that X is Gaussian can be replaced by the more general 
assumption that the process X is linear in the strong sense, under appropriate 
moment conditions on the innovation. The proofs are then more involved, es- 
pecially to establish the invariance principle which is pivotal in our derivation. 

Remark 2. By allowing $q(n_j)$ to increase but at a slower rate than the number
of observations, the estimator of the averaged scalogram variance adjusts
appropriately for general forms of short-range dependence among the scalogram
coefficients. Of course, although the condition (1.17) ensures the consistency
of $s^2_{j,n_j}$, it provides little guidance in selecting a truncation lag $q(n_j)$. When
$q(n_j)$ becomes large relative to the sample size $n_j$, the finite-sample
distribution of the test statistic might be far from its asymptotic limit. However,
$q(n_j)$ cannot be chosen too small since the autocovariances beyond lag $q(n_j)$
may be significant and should be included in the weighted sum. Therefore,
the truncation lag should ideally be chosen using a data-driven procedure.
Andrews (1991) and Newey and West (1994) provide data-dependent rules
for choosing $q(n_j)$. These contributions suggest that selecting the bandwidth
according to an asymptotically optimal procedure tends to lead to more
accurately sized test statistics than traditional procedures do. The method
suggested by Andrews (1991) for selecting the bandwidth optimally is a plug-in
approach. This procedure requires the researcher to fit an ARMA model of
given order to provide a rough estimator of the spectral density and of its
derivatives at zero frequency (although misspecification of the order affects
only optimality but not consistency). The minimax optimality of this method
is based on an asymptotic mean-squared error criterion and its behavior in the
finite sample case is not precisely known. The procedure outlined in Newey
and West (1994) bypasses the modeling step by using instead a
pilot truncated kernel estimate of the spectral density and its derivative.
We use these data-driven procedures in the Monte Carlo experiments (these
procedures have been implemented in the R package sandwich).
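
For quick experiments, a fixed rule of thumb is sometimes used in place of a data-driven selection. The sketch below uses the rule $q = \lfloor 4 (n/100)^{2/9} \rfloor$, which is often quoted for the Bartlett kernel in connection with Newey and West (1994). It is a simpler stand-in, not the plug-in procedures used in the Monte Carlo experiments of this paper, and the function name is hypothetical.

```python
import math

def rule_of_thumb_lag(n):
    """Fixed truncation lag floor(4 * (n / 100)^(2/9)) for a sample of size n."""
    return int(math.floor(4.0 * (n / 100.0) ** (2.0 / 9.0)))

for n in (100, 1000, 10000):
    print(n, rule_of_thumb_lag(n))
```

Such a rule grows to infinity more slowly than $n$, so it is compatible with condition (1.17), but it ignores the strength of the dependence in the data.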

Proof. Since $X$ is Gaussian and $\Delta^M X$ is centered, Eq. (17) in Moulines et al.
(2007) implies that $\{W_{j,k},\ k \in \mathbb{Z}\}$ is a centered Gaussian process, whose
distribution is determined by

$$\gamma_j(h) = \mathrm{Cov}(W_{j,0}, W_{j,h}) = \int_{-\pi}^{\pi} D_{j,0}(\lambda; f)\, e^{-i\lambda h}\, d\lambda.$$

From Corollary 1 and equation (16) in Moulines et al. (2007), we have

$$D_{j,0}(\lambda; f) = \sum_{l=0}^{2^j - 1} f(2^{-j}(\lambda + 2l\pi))\, 2^{-j}\, \left| \tilde{H}_j(2^{-j}(\lambda + 2l\pi)) \right|^2 \left| 1 - e^{-i 2^{-j}(\lambda + 2l\pi)} \right|^{2M},$$

where $\tilde{H}_j$ is a trigonometric polynomial (so that $H_j(\lambda) = (1 - e^{-i\lambda})^M \tilde{H}_j(\lambda)$). Using that

$$|1 - e^{-i\xi}|^{2M} = |\xi|^{2M} \left| \frac{1 - e^{-i\xi}}{i\xi} \right|^{2M}$$

and that $|\xi|^{2M} f(\xi)$ has a bounded second order derivative, we get that
$D_{j,0}(\lambda; f)$ has also a bounded second order derivative. In particular,

$$\int_{-\pi}^{\pi} |D_{j,0}(\lambda; f)|^2\, d\lambda < \infty \quad\text{and}\quad \sum_{s \in \mathbb{Z}} |\gamma_j(s)| < \infty. \tag{1.20}$$

The proof may be decomposed into 3 steps. We first prove the consistency
of the Bartlett estimator of the variance of the squares of wavelet coefficients
$s^2_{j,n_j}$, that is (1.18). Then we determine the asymptotic normality of the
finite-dimensional distributions of the empirical scalogram, suitably centered and
normalized. Finally a tightness criterion is proved, to establish the convergence
in the Skorokhod space. Combining these three steps completes the proof of
(1.19).

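Step 1. Observe that, by the Gaussian property, $\mathrm{Cov}(W^2_{j,0}, W^2_{j,h}) = 2\gamma_j^2(h)$.
Using Theorem 3-i in Giraitis et al. (2003), the limit (1.18) follows from

$$2 \sum_{h=-\infty}^{+\infty} \gamma_j^2(h) = \frac{1}{\pi} \int_{-\pi}^{\pi} |D_{j,0}(\lambda; f)|^2\, d\lambda < \infty, \tag{1.21}$$

and

$$\sup_{h \in \mathbb{Z}} \sum_{r,s=-\infty}^{+\infty} |\mathcal{K}(h, r, s)| < \infty, \tag{1.22}$$

where

$$\mathcal{K}(h, r, s) = \mathrm{Cum}(W^2_{j,k}, W^2_{j,k+h}, W^2_{j,k+r}, W^2_{j,k+s}). \tag{1.23}$$

Equation (1.21) follows from Parseval's equality and (1.20). Let us now prove
(1.22). Using that the wavelet coefficients are Gaussian, we obtain

$$\mathcal{K}(h, r, s) = 12 \left\{ \gamma_j(h)\gamma_j(r - s)\gamma_j(h - r)\gamma_j(s) + \gamma_j(h)\gamma_j(r - s)\gamma_j(h - s)\gamma_j(r) + \gamma_j(s - h)\gamma_j(r - h)\gamma_j(r)\gamma_j(s) \right\}.$$

The bound of the last term is given by

$$\sup_{h \in \mathbb{Z}} \sum_{r,s=-\infty}^{+\infty} |\gamma_j(s - h)\gamma_j(r - h)\gamma_j(r)\gamma_j(s)| \leq \sup_{h} \left( \sum_{r=-\infty}^{+\infty} |\gamma_j(r)\gamma_j(r - h)| \right)^2,$$

which is finite by the Cauchy-Schwarz inequality, since $\sum_{r \in \mathbb{Z}} \gamma_j^2(r) < \infty$.
Using $|\gamma_j(h)| \leq \gamma_j(0)$ and the Cauchy-Schwarz inequality, we have

$$\sup_{h \in \mathbb{Z}} \sum_{r,s=-\infty}^{+\infty} |\gamma_j(h)\gamma_j(r - s)\gamma_j(h - r)\gamma_j(s)| \leq \gamma_j(0) \sum_{u \in \mathbb{Z}} \gamma_j^2(u) \sum_{s \in \mathbb{Z}} |\gamma_j(s)|,$$

and the same bound applies to

$$\sup_{h \in \mathbb{Z}} \sum_{r,s=-\infty}^{+\infty} |\gamma_j(h)\gamma_j(r - s)\gamma_j(h - s)\gamma_j(r)|.$$

Hence, we have (1.22) by (1.20), which concludes the proof of Step 1.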
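
The covariance identity used at the start of Step 1 is an instance of the moment formula for centered jointly Gaussian variables (Isserlis' formula). A short derivation, written for generic centered Gaussian variables $U$ and $V$ (symbols introduced here only for this illustration), reads:

```latex
% Isserlis' formula for centered jointly Gaussian (U, V):
%   E[U^2 V^2] = E[U^2] E[V^2] + 2 (E[UV])^2 .
\begin{align*}
\mathrm{Cov}(U^2, V^2)
  &= \mathbb{E}[U^2 V^2] - \mathbb{E}[U^2]\,\mathbb{E}[V^2] \\
  &= \mathbb{E}[U^2]\,\mathbb{E}[V^2] + 2\,(\mathbb{E}[UV])^2
     - \mathbb{E}[U^2]\,\mathbb{E}[V^2] \\
  &= 2\,\mathrm{Cov}(U, V)^2 .
\end{align*}
% With U = W_{j,0} and V = W_{j,h}, this gives
%   Cov(W_{j,0}^2, W_{j,h}^2) = 2 \gamma_j^2(h).
```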
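Step 2. Let us define

$$S_{n_j}(t) = \frac{1}{\sqrt{n_j}} \sum_{i=1}^{\lfloor n_j t \rfloor} (W^2_{j,i} - \sigma^2_j), \tag{1.24}$$

where $\sigma^2_j = \mathbb{E}(W^2_{j,i})$, and $\lfloor x \rfloor$ is the integer part of $x$. Step 2 consists in proving
that for $0 \leq t_1 \leq \dots \leq t_k \leq 1$, and $\mu_1, \dots, \mu_k \in \mathbb{R}$,

$$\sum_{i=1}^{k} \mu_i S_{n_j}(t_i) \overset{d}{\longrightarrow} \mathcal{N}\left(0,\ \frac{1}{\pi} \int_{-\pi}^{\pi} |D_{j,0}(\lambda; f)|^2\, d\lambda \times \mathrm{Var}\left( \sum_{i=1}^{k} \mu_i B(t_i) \right) \right). \tag{1.25}$$

Observe that

$$\sum_{i=1}^{k} \mu_i S_{n_j}(t_i) = \frac{1}{\sqrt{n_j}} \sum_{i=1}^{k} \mu_i \sum_{l=1}^{n_j} (W^2_{j,l} - \sigma^2_j)\, 1_{\{l \leq \lfloor n_j t_i \rfloor\}} = \sum_{l=1}^{n_j} a_{l,n_j} (W^2_{j,l} - \sigma^2_j) = \xi_{n_j}^T A_{n_j} \xi_{n_j} - \mathbb{E}\left[ \xi_{n_j}^T A_{n_j} \xi_{n_j} \right],$$

where we set $a_{l,n_j} = \frac{1}{\sqrt{n_j}} \sum_{i=1}^{k} \mu_i 1_{\{l \leq \lfloor n_j t_i \rfloor\}}$, $\xi_{n_j} = (W_{j,1}, \dots, W_{j,n_j})^T$ and $A_{n_j}$
is the diagonal matrix with diagonal entries $(a_{1,n_j}, \dots, a_{n_j,n_j})$. Applying
(Moulines et al., 2008, Lemma 12), (1.25) is obtained by proving that, as
$n_j \to \infty$,

$$\rho(A_{n_j})\, \rho(\Gamma_{n_j}) \to 0, \tag{1.26}$$

$$\mathrm{Var}\left( \sum_{i=1}^{k} \mu_i S_{n_j}(t_i) \right) \to \frac{1}{\pi} \int_{-\pi}^{\pi} |D_{j,0}(\lambda; f)|^2\, d\lambda \times \mathrm{Var}\left( \sum_{i=1}^{k} \mu_i B(t_i) \right), \tag{1.27}$$

where $\rho(A)$ denotes the spectral radius of the matrix $A$, that is, the
maximum modulus of its eigenvalues, and $\Gamma_{n_j}$ is the covariance matrix of $\xi_{n_j}$.
The process $(W_{j,i})_{i=1,\dots,n_j}$ is stationary with spectral density $D_{j,0}(\cdot; f)$.
Thus, by Lemma 2 in Moulines et al. (2007) its covariance matrix $\Gamma_{n_j}$ satisfies
$\rho(\Gamma_{n_j}) \leq 2\pi \sup_{\lambda} D_{j,0}(\lambda; f)$. Furthermore, as $n_j \to \infty$,

$$\rho(A_{n_j}) = \max_{1 \leq l \leq n_j} \left| \frac{1}{\sqrt{n_j}} \sum_{i=1}^{k} \mu_i 1_{\{l \leq \lfloor n_j t_i \rfloor\}} \right| \leq n_j^{-1/2} \sum_{i=1}^{k} |\mu_i| \to 0,$$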
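and (1.26) holds. We now prove (1.27). Using that $B(t)$ has variance $t$ and
independent and stationary increments, and that these properties characterize
its covariance function, it is sufficient to show that, for all $t \in [0, 1]$, as $n_j \to \infty$,

$$\mathrm{Var}(S_{n_j}(t)) \to t\, \frac{1}{\pi} \int_{-\pi}^{\pi} |D_{j,0}(\lambda; f)|^2\, d\lambda, \tag{1.28}$$

and for all $0 \leq r \leq s \leq t \leq 1$, as $n_j \to \infty$,

$$\mathrm{Cov}(S_{n_j}(t) - S_{n_j}(s),\ S_{n_j}(r)) \to 0. \tag{1.29}$$

For any sets $A, B \subseteq [0, 1]$, we set

$$V_{n_j}(\tau, A, B) = \frac{1}{n_j} \sum_{k \geq 1} 1_A((k + \tau)/n_j)\, 1_B(k/n_j).$$

For all $0 \leq s, t \leq 1$, we have

$$\mathrm{Cov}(S_{n_j}(t), S_{n_j}(s)) = \frac{1}{n_j} \sum_{i=1}^{\lfloor n_j t \rfloor} \sum_{k=1}^{\lfloor n_j s \rfloor} \mathrm{Cov}(W^2_{j,i}, W^2_{j,k}) = 2 \sum_{\tau \in \mathbb{Z}} \gamma_j^2(\tau)\, V_{n_j}(\tau, ]0, t], ]0, s]).$$

The previous display applies to the left-hand side of (1.28) when $s = t$ and
for $0 \leq r \leq s \leq t \leq 1$, it yields

$$\mathrm{Cov}(S_{n_j}(t) - S_{n_j}(s),\ S_{n_j}(r)) = 2 \sum_{\tau \in \mathbb{Z}} \gamma_j^2(\tau)\, V_{n_j}(\tau, ]s, t], ]0, r]).$$

Observe that for all $A, B \subseteq [0, 1]$, $\sup_{\tau} |V_{n_j}(\tau, A, B)| \leq 1$. Hence, by
dominated convergence, the limits in (1.28) and (1.29) are obtained by computing
the limits of $V_{n_j}(\tau, ]0, t], ]0, t])$ and $V_{n_j}(\tau, ]s, t], ]0, r])$ respectively. We have
for any $\tau \in \mathbb{Z}$, $t > 0$, and $n_j$ large enough,

$$\sum_{k \geq 1} 1_{\{(k+\tau)/n_j \in ]0,t]\}}\, 1_{\{k/n_j \in ]0,t]\}} = \left\{ (\lfloor n_j t \rfloor \wedge \{\lfloor n_j t \rfloor - \tau\}) - (0 \vee \{-\tau\}) \right\}_+ = \lfloor n_j t \rfloor - |\tau|.$$

Hence, as $n_j \to \infty$, $V_{n_j}(\tau, ]0, t], ]0, t]) \to t$ and, by (1.21), (1.28) follows. We
have for any $\tau \in \mathbb{Z}$ and $0 < r \leq s \leq t$,

$$\sum_{k \geq 1} 1_{\{(k+\tau)/n_j \in ]s,t]\}}\, 1_{\{k/n_j \in ]0,r]\}} = \left\{ (\lfloor n_j r \rfloor \wedge \{\lfloor n_j t \rfloor - \tau\}) - (0 \vee \{\lfloor n_j s \rfloor - \tau\}) \right\}_+ = (\lfloor n_j r \rfloor - \lfloor n_j s \rfloor + \tau)_+ \to 1_{\{r = s\}}\, \tau_+,$$

where the last equality holds for $n_j$ large enough and the limit as $n_j \to \infty$.
Hence $V_{n_j}(\tau, ]s, t], ]0, r]) \to 0$ and (1.29) follows, which concludes Step 2.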
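Step 3. We now prove the tightness of $\{S_{n_j}(t),\ t \in [0, 1]\}$ in the Skorokhod
metric space. By Theorem 13.5 in Billingsley (1999), it is sufficient to prove
that for all $0 \leq r \leq s \leq t$,

$$\mathbb{E}\left[ |S_{n_j}(s) - S_{n_j}(r)|^2\, |S_{n_j}(t) - S_{n_j}(s)|^2 \right] \leq C |t - r|^2,$$

where $C > 0$ is some constant independent of $r$, $s$, $t$ and $n_j$. We shall prove
that, for all $0 \leq r \leq t$,

$$\mathbb{E}\left[ |S_{n_j}(t) - S_{n_j}(r)|^4 \right] \leq C_1 \left\{ n_j^{-1} (\lfloor n_j t \rfloor - \lfloor n_j r \rfloor) \right\}^2. \tag{1.30}$$

By the Cauchy-Schwarz inequality, and using that, for $0 \leq r \leq s \leq t$,

$$n_j^{-1}(\lfloor n_j t \rfloor - \lfloor n_j s \rfloor) \times n_j^{-1}(\lfloor n_j s \rfloor - \lfloor n_j r \rfloor) \leq 4 (t - r)^2,$$

the criterion (1.30) implies the previous criterion. Hence the tightness follows
from (1.30), that we now prove. We have, for any $i = (i_1, \dots, i_4)$,

$$\mathbb{E}\left[ \prod_{k=1}^{4} (W^2_{j,i_k} - \sigma^2_j) \right] = \mathrm{Cum}(W^2_{j,i_1}, \dots, W^2_{j,i_4}) + \mathrm{Cov}(W^2_{j,i_1}, W^2_{j,i_2})\, \mathrm{Cov}(W^2_{j,i_3}, W^2_{j,i_4}) + \mathrm{Cov}(W^2_{j,i_1}, W^2_{j,i_3})\, \mathrm{Cov}(W^2_{j,i_2}, W^2_{j,i_4}) + \mathrm{Cov}(W^2_{j,i_1}, W^2_{j,i_4})\, \mathrm{Cov}(W^2_{j,i_2}, W^2_{j,i_3}).$$

It follows that, for $0 \leq r \leq t \leq 1$,

$$\mathbb{E}\left[ |S_{n_j}(t) - S_{n_j}(r)|^4 \right] = \frac{1}{n_j^2} \sum_{i \in A_{r,t}^4} \mathrm{Cum}(W^2_{j,i_1}, \dots, W^2_{j,i_4}) + \frac{3}{n_j^2} \left( \sum_{i \in A_{r,t}^2} \mathrm{Cov}(W^2_{j,i_1}, W^2_{j,i_2}) \right)^2,$$

where $A_{r,t} = \{\lfloor n_j r \rfloor + 1, \dots, \lfloor n_j t \rfloor\}$. Observe that

$$0 \leq \frac{1}{n_j} \sum_{i \in A_{r,t}^2} \mathrm{Cov}(W^2_{j,i_1}, W^2_{j,i_2}) \leq 2 \sum_{u \in \mathbb{Z}} \gamma_j^2(u) \times n_j^{-1} (\lfloor n_j t \rfloor - \lfloor n_j r \rfloor).$$

Using that, by (1.23), $\mathrm{Cum}(W^2_{j,i_1}, \dots, W^2_{j,i_4}) = \mathcal{K}(i_2 - i_1, i_3 - i_1, i_4 - i_1)$, we
have

$$\sum_{i \in A_{r,t}^4} \left| \mathrm{Cum}(W^2_{j,i_1}, \dots, W^2_{j,i_4}) \right| \leq (\lfloor n_j t \rfloor - \lfloor n_j r \rfloor) \sum_{h,s,l = \lfloor n_j r \rfloor - \lfloor n_j t \rfloor + 1}^{\lfloor n_j t \rfloor - \lfloor n_j r \rfloor - 1} |\mathcal{K}(h, s, l)| \leq 2 (\lfloor n_j t \rfloor - \lfloor n_j r \rfloor)^2 \sup_{h \in \mathbb{Z}} \sum_{r,s=-\infty}^{+\infty} |\mathcal{K}(h, r, s)|.$$

The last three displays and (1.22) imply (1.30), which proves the tightness.

Finally, observing that the variance (1.21) is positive, unless $f$ vanishes
almost everywhere, the convergence (1.19) follows from Slutsky's lemma and
the three previous steps.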
1.3.2 The multiple-scale case

The results above can be extended to test simultaneously for changes in wavelet
variances occurring at multiple time-scales. To construct a
multiple scale test, consider the between-scale process

$$\left\{ [W^X_{j,k},\ W^X_{j,k}(j - j')^T]^T \right\}_{k \in \mathbb{Z}}, \tag{1.31}$$

where the superscript $T$ denotes the transpose and $W^X_{j,k}(u)$, $u = 0, 1, \dots, j$,
is defined as follows:

$$W^X_{j,k}(u) = \left[ W^X_{j-u, 2^u k},\ W^X_{j-u, 2^u k + 1},\ \dots,\ W^X_{j-u, 2^u k + 2^u - 1} \right]^T. \tag{1.32}$$

It is a $2^u$-dimensional vector of wavelet coefficients at scale $j' = j - u$ and
involves all possible translations of the position index $2^u k$ by $v = 0, 1, \dots, 2^u - 1$.
The index $u$ in (1.32) denotes the scale difference $j - j' \geq 0$ between the
finest scale $j'$ and the coarsest scale $j$. Observe that $W^X_{j,k}(0)$ ($u = 0$) is the
scalar $W^X_{j,k}$. It is shown in (Moulines et al., 2007, Corollary 1) that, when
$\Delta^M X$ is covariance stationary, the between scale process
$\{[W^X_{j,k},\ W^X_{j,k}(j - j')^T]^T\}_{k \in \mathbb{Z}}$ is also covariance stationary. Moreover, for all $0 \leq u \leq j$, the
between scale covariance matrix is defined as

$$\mathrm{Cov}\left( W^X_{j,0},\ W^X_{j,k}(u) \right) = \int_{-\pi}^{\pi} e^{i\lambda k}\, D_{j,u}(\lambda; f)\, d\lambda, \tag{1.33}$$

where $D_{j,u}(\lambda; f)$ is the cross-spectral density function of the between-scale
process given by (see Moulines et al., 2007, Corollary 1)

$$D_{j,u}(\lambda; f) \overset{\text{def}}{=} \sum_{l=0}^{2^j - 1} e_u(\lambda + 2l\pi)\, f(2^{-j}(\lambda + 2l\pi))\, 2^{-j/2} H_j(2^{-j}(\lambda + 2l\pi)) \times 2^{-(j-u)/2}\, \overline{H_{j-u}(2^{-j}(\lambda + 2l\pi))}, \tag{1.34}$$

where for all $\xi \in \mathbb{R}$,

$$e_u(\xi) \overset{\text{def}}{=} 2^{-u/2} \left[ 1,\ e^{-i 2^{-u} \xi},\ \dots,\ e^{-i (2^u - 1) 2^{-u} \xi} \right]^T.$$

The case $u = 0$ corresponds to the spectral density of the within-scale process
$\{W_{j,k}\}_{k \in \mathbb{Z}}$ given in (1.11). Under the null hypothesis that $X$ is K-th order
stationary, a multiple scale procedure aims at testing that the scalogram in a
range satisfies

$$H_0 : \sigma^2_{j,1} = \dots = \sigma^2_{j,n_j}, \quad\text{for all } j \in \{J_1, J_1 + 1, \dots, J_2\}, \tag{1.35}$$

where $J_1$ and $J_2$ are the finest and the coarsest scales included in the
procedure, respectively. The wavelet coefficients at different scales are not
uncorrelated, so that both the within-scale and the between-scale covariances need to
be taken into account.



[Figure: construction of the between-scale process from the wavelet coefficients
at scales $J_1, \dots, J_2$ (with $u = J_2 - J_1$); $Y_{J_1,J_2,k}$ is a vector-valued stationary process.]

Fig. 1.1. Between scale stationary process.



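As before, we use a CUSUM statistic in the wavelet domain. However, we
now use multiple scale vector statistics. Consider the following process

$$Y_{J_1,J_2,i} = \left( W^2_{J_2,i},\ \sum_{u=1}^{2} W^2_{J_2 - 1,\, 2(i-1)+u},\ \dots,\ \sum_{u=1}^{2^{J_2 - J_1}} W^2_{J_1,\, 2^{J_2 - J_1}(i-1)+u} \right)^T.$$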

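The Bartlett estimator of the covariance matrix of the squared wavelet
coefficients for scales $\{J_1, \dots, J_2\}$ is the $(J_2 - J_1 + 1) \times (J_2 - J_1 + 1)$ symmetric
positive definite matrix given by

$$\hat{\Gamma}_{J_1,J_2} = \sum_{\tau = -q(n_{J_2})}^{q(n_{J_2})} w_{\tau}[q(n_{J_2})]\, \hat{\gamma}_{J_1,J_2}(\tau), \quad\text{where} \tag{1.36}$$

$$\hat{\gamma}_{J_1,J_2}(\tau) = \frac{1}{n_{J_2}} \sum_{i,\, i+\tau = 1}^{n_{J_2}} \left( Y_{J_1,J_2,i} - \bar{Y}_{J_1,J_2} \right) \left( Y_{J_1,J_2,i+\tau} - \bar{Y}_{J_1,J_2} \right)^T, \tag{1.37}$$

where $\bar{Y}_{J_1,J_2} = \frac{1}{n_{J_2}} \sum_{i=1}^{n_{J_2}} Y_{J_1,J_2,i}$.

Finally, let us define the vector of partial sums from scale $J_1$ to scale $J_2$ as

$$S_{J_1,J_2}(t) = \frac{1}{\sqrt{n_{J_2}}} \left[ \sum_{i=1}^{\lfloor n_j t \rfloor} W^2_{j,i} \right]_{j = J_1, \dots, J_2}. \tag{1.38}$$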

Theorem 2. Under the assumptions of Theorem 1, we have, as $n \to \infty$,

^,72 = r JllJa + Op f ^^-) + Op(g- 1 (n, 72 )), (1.39) 

where r Ju j 2 (j,j') = J2hez Cov ( Y j,0, Y j',h), with 1 < j,f < J 2 - Ji + 1 and, 

S(Sj 1 ,J I (t)-Efo,Jii i »5(t) = (^(t),---,^^*)), (1-40) 

in Z> /2 ~ ,/l+1 [0, 1], where {Bj{t)}j=j lt ... t j 2 are independent Brownian motions. 

The proof of this result follows the same line as the proof of Theorem 1 
and is therefore omitted. 

1.4 Test statistics 

Under the assumptions of Theorem 1, the statistic

$$
T_{J_1,J_2}(t) \stackrel{\mathrm{def}}{=} \left(S_{J_1,J_2}(t) - tS_{J_1,J_2}(1)\right)^T \hat\Gamma_{J_1,J_2}^{-1} \left(S_{J_1,J_2}(t) - tS_{J_1,J_2}(1)\right) \tag{1.41}
$$

converges weakly in the Skorokhod space $D([0,1])$,

$$
T_{J_1,J_2}(t) \stackrel{d}{\longrightarrow} \sum_{\ell=1}^{J_2-J_1+1} \left[B^0_\ell(t)\right]^2 \tag{1.42}
$$

where $t \mapsto (B^0_1(t), \dots, B^0_{J_2-J_1+1}(t))$ is a vector of $J_2 - J_1 + 1$ independent
Brownian bridges.



Nominal S.   d = 1    d = 2    d = 3    d = 4    d = 5    d = 6
0.95         0.4605   0.7488   1.0014   1.2397   1.4691   1.6848
0.99         0.7401   1.0721   1.3521   1.6267   1.8667   2.1259

Table 1.1. Quantiles of the distribution C(d) (see (1.44)) for different values of d.

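For any continuous function $F : D[0,1] \to \mathbb{R}$, the continuous mapping
theorem implies that

$$
F\left[T_{J_1,J_2}(\cdot)\right] \stackrel{d}{\longrightarrow} F\left[ \sum_{\ell=1}^{J_2-J_1+1} \left[B^0_\ell(\cdot)\right]^2 \right] .
$$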

We may for example apply either integral or max functionals, or weighted
versions of these. A classical example of an integral functional is the so-called
Cramér-von Mises functional given by

$$
\mathrm{CVM}(J_1,J_2) \stackrel{\mathrm{def}}{=} \int_0^1 T_{J_1,J_2}(t)\,dt , \tag{1.43}
$$

which converges to $C(J_2 - J_1 + 1)$ where for any integer $d$,

$$
C(d) \stackrel{\mathrm{def}}{=} \int_0^1 \sum_{\ell=1}^{d} \left[B^0_\ell(t)\right]^2 dt . \tag{1.44}
$$

The test rejects the null hypothesis when $\mathrm{CVM}(J_1,J_2) > c(J_2 - J_1 + 1, \alpha)$, where
$c(d,\alpha)$ is the $(1-\alpha)$-th quantile of the distribution of $C(d)$. The distribution of
the random variable $C(d)$ has been derived by Kiefer (1959) (see also Carmona
et al. (1999) for more recent references). It holds that, for $x > 0$,

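$$
P(C(d) \le x) = \frac{2^{(d+1)/2}}{\pi^{1/2} x^{d/4}} \sum_{j=0}^{\infty} \frac{\Gamma(j + d/2)}{j!\,\Gamma(d/2)}\, e^{-(j + d/4)^2/x}\, \mathrm{Cyl}_{(d-2)/2}\!\left( \frac{2j + d/2}{x^{1/2}} \right) ,
$$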

where $\Gamma$ denotes the gamma function and Cyl are the parabolic cylinder
functions. The quantiles of this distribution are given in Table 1.1 for different
values of $d = J_2 - J_1 + 1$. It is also possible to use the max functional, leading
to an analogue of the Kolmogorov-Smirnov statistic,

$$
\mathrm{KSM}(J_1,J_2) \stackrel{\mathrm{def}}{=} \sup_{0 \le t \le 1} T_{J_1,J_2}(t) \tag{1.45}
$$

which converges to $D(J_2 - J_1 + 1)$ where for any integer $d$,

$$
D(d) \stackrel{\mathrm{def}}{=} \sup_{0 \le t \le 1} \sum_{\ell=1}^{d} \left[B^0_\ell(t)\right]^2 . \tag{1.46}
$$

d      1          2          3        4          5         6
0.95   1.358      1.58379    1.7472   1.88226    2.00      2.10597
0.99   1.627624   1.842726   2.001    2.132572   2.24798   2.35209

Table 1.2. Quantiles of the distribution D(d) (see (1.46)) for different values of d.

The test rejects the null hypothesis when $\mathrm{KSM}(J_1,J_2) > \delta(J_2 - J_1 + 1, \alpha)$, where
$\delta(d,\alpha)$ is the $(1-\alpha)$-quantile of $D(d)$. The distribution of $D(d)$ has again
been derived by Kiefer (1959) (see also Pitman and Yor (1999) for more recent
references). It holds that, for $x > 0$,

2 i+(2-d)/2 °° f» ( f \ 

where $0 < j_{\nu,1} < j_{\nu,2} < \dots$ is the sequence of positive zeros of $J_\nu$, the Bessel
function of index $\nu = (d-2)/2$. The quantiles of this distribution are given
in Table 1.2.
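As a quick sanity check on the $d = 1$ column of Table 1.2, the sketch below evaluates the classical alternating series for the distribution of the supremum of the absolute value of a scalar Brownian bridge (the Kolmogorov distribution). This series is a standard result that is not quoted in the chapter, and the function name `kolmogorov_cdf` is hypothetical. If the check passes, it suggests that the $d = 1$ entries are quantiles of $\sup_t |B^0(t)|$, that is, of the square root of the quantity in (1.46); this reading is an inference from the numbers, not a statement made in the text.

```python
import math

def kolmogorov_cdf(x, terms=100):
    """P(sup_{0<=t<=1} |B0(t)| <= x) for a scalar Brownian bridge B0.

    Uses the classical series 1 - 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 k^2 x^2),
    truncated after `terms` terms.
    """
    if x <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
            for k in range(1, terms + 1))
    return 1.0 - 2.0 * s

# Evaluate the series at the two d = 1 entries of Table 1.2.
p95 = kolmogorov_cdf(1.358)
p99 = kolmogorov_cdf(1.627624)
```

Both values should be close to the nominal levels 0.95 and 0.99, up to the rounding of the tabulated quantiles.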



1.5 Power of the W2-CUSUM statistics 
1.5.1 Power of the test in single scale case 

In this section we investigate the power of the test. A minimal requirement
is to establish that the test procedure is pointwise consistent in the presence of
a breakpoint, i.e. that under a fixed alternative, the probability of detection
converges to one as the sample size goes to infinity. We must therefore first
define such an alternative. For simplicity, we consider an alternative where
the process exhibits a single breakpoint, though it is likely that the test
has power against a more general class of alternatives.

The alternative that we consider in this section is defined as follows. Let
$f_1$ and $f_2$ be two given generalized spectral densities and suppose that, at a
given scale $j$, $\int_{-\pi}^{\pi} |H_j(\lambda)|^2 f_i(\lambda)\,d\lambda < \infty$, $i = 1, 2$, and

$$
\int_{-\pi}^{\pi} |H_j(\lambda)|^2 \left(f_1(\lambda) - f_2(\lambda)\right) d\lambda \ne 0 . \tag{1.47}
$$

Let $(X_{l,i})_{l\in\mathbb{Z}}$, $i = 1, 2$, be two Gaussian processes, defined on the same
probability space, with generalized spectral densities $f_i$. We do not specify the
dependence structure between these two processes, which can be arbitrary.
Let $\kappa \in\, ]0,1[$ be a breakpoint. We consider a sequence of Gaussian processes
$(X_k^{(n)})_{k\in\mathbb{Z}}$ such that

$$
X_k^{(n)} = X_{k,1} \ \text{for } k \le \lfloor n\kappa \rfloor \quad \text{and} \quad X_k^{(n)} = X_{k,2} \ \text{for } k \ge \lfloor n\kappa \rfloor + 1 . \tag{1.48}
$$

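Theorem 3. Let $\{X_k^{(n)}\}_{k\in\mathbb{Z}}$ be a sequence of processes specified by (1.47)
and (1.48). Assume that $q(n_j)$ is non decreasing and

$$
q(n_j) \to \infty \quad \text{and} \quad \frac{q(n_j)}{n_j} \to 0 \quad \text{as } n_j \to \infty . \tag{1.49}
$$

Then the statistic $T_{n_j}$ defined by (1.13) satisfies

$$
\frac{\sqrt{n_j}}{\sqrt{2q(n_j)}} \sqrt{\kappa(1-\kappa)}\,(1 + o_p(1)) \le T_{n_j} \stackrel{P}{\longrightarrow} \infty . \tag{1.50}
$$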

Proof. Let $k_j = \lfloor n_j\kappa \rfloor$ be the change point in the wavelet spectrum at scale $j$. We
write $q$ for $q(n_j)$ and suppress the dependence on $n$ in this proof to alleviate the
notation. By definition $T_{n_j} = \frac{1}{s_{j,n_j}} \sup_{0 \le t \le 1} \left|S_{n_j}(t) - tS_{n_j}(1)\right|$, where the process
$t \mapsto S_{n_j}(t)$ is defined in (1.24). Therefore, $T_{n_j} \ge \frac{1}{s_{j,n_j}} \left|S_{n_j}(\kappa) - \kappa S_{n_j}(1)\right|$. The
proof consists in establishing that $\frac{1}{s_{j,n_j}} \left|S_{n_j}(\kappa) - \kappa S_{n_j}(1)\right| = \frac{\sqrt{n_j}}{\sqrt{2q}} \sqrt{\kappa(1-\kappa)}\,(1 + o_p(1))$.
We first decompose this difference as follows



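$$
S_{n_j}(\kappa) - \kappa S_{n_j}(1) = \frac{1}{\sqrt{n_j}} \left[ \sum_{i=1}^{k_j} W^2_{j,i} - \kappa \sum_{i=1}^{n_j} W^2_{j,i} \right] = B_{n_j} + f_{n_j} ,
$$

where $B_{n_j}$ is a fluctuation term

$$
B_{n_j} = \frac{1}{\sqrt{n_j}} \left[ \sum_{i=1}^{k_j} \left(W^2_{j,i} - \sigma^2_{j,i}\right) - \kappa \sum_{i=1}^{n_j} \left(W^2_{j,i} - \sigma^2_{j,i}\right) \right] \tag{1.51}
$$

and $f_{n_j}$ is a bias term

$$
f_{n_j} = \frac{1}{\sqrt{n_j}} \left[ \sum_{i=1}^{k_j} \sigma^2_{j,i} - \kappa \sum_{i=1}^{n_j} \sigma^2_{j,i} \right] , \tag{1.52}
$$

with $\sigma^2_{j,i} = \mathbb{E}[W^2_{j,i}]$.
Since the support of $h_{j,\cdot}$ is included in $[-T(2^j + 1), 0]$, where $h_{j,l}$ is defined in (1.7),
there exists a constant $a > 0$ such that, with $k = \lfloor n\kappa \rfloor$,

$$
W_{j,i} = W_{j,i;1} = \sum_{l \le k} h_{j,2^j i - l} X_{l,1} \quad \text{for } i \le k_j , \tag{1.53}
$$

$$
W_{j,i} = W_{j,i;2} = \sum_{l > k} h_{j,2^j i - l} X_{l,2} \quad \text{for } i > k_j + a , \tag{1.54}
$$

$$
W_{j,i} = \sum_{l} h_{j,2^j i - l} X_l^{(n)} \quad \text{for } k_j < i \le k_j + a . \tag{1.55}
$$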

Since the processes $\{X_{l,1}\}_{l\in\mathbb{Z}}$ and $\{X_{l,2}\}_{l\in\mathbb{Z}}$ are both $K$-th order covariance
stationary, the two processes $\{W_{j,i;1}\}_{i\in\mathbb{Z}}$ and $\{W_{j,i;2}\}_{i\in\mathbb{Z}}$ are also covariance
stationary. The wavelet coefficients $W_{j,i}$ for $i \in \{k_j, \dots, k_j + a\}$ are computed
using observations from the two processes $X_1$ and $X_2$. Let us show that there
exists a constant $C > 0$ such that, for all integers $l$ and $\tau$,

$$
\mathrm{Var}\left( \sum_{i=l}^{l+\tau} W^2_{j,i} \right) \le C\tau . \tag{1.56}
$$

Using (1.21), we have, for $\epsilon = 1, 2$,

Var (X U M ^ J |D„o ;e (A)| 2 dA 

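where $D_{j,0;1}(\lambda)$ and $D_{j,0;2}(\lambda)$ denote the spectral densities of the stationary
processes $\{W_{j,i;1}\}_{i\in\mathbb{Z}}$ and $\{W_{j,i;2}\}_{i\in\mathbb{Z}}$ respectively. Using Minkowski's inequality,
we have for $l \le k_j < k_j + a < l + \tau$ that $\left( \mathrm{Var} \sum_{i=l}^{l+\tau} W^2_{j,i} \right)^{1/2}$ is at most

$$
\left( \mathrm{Var} \sum_{i=l}^{k_j} W^2_{j,i} \right)^{1/2} + \sum_{i=k_j+1}^{k_j+a} \left( \mathrm{Var}\, W^2_{j,i} \right)^{1/2} + \left( \mathrm{Var} \sum_{i=k_j+a+1}^{l+\tau} W^2_{j,i} \right)^{1/2}
$$

$$
\le \left( \mathrm{Var} \sum_{i=l}^{k_j} W^2_{j,i;1} \right)^{1/2} + a \sup_i \left( \mathrm{Var}\, W^2_{j,i} \right)^{1/2} + \left( \mathrm{Var} \sum_{i=k_j+a+1}^{l+\tau} W^2_{j,i;2} \right)^{1/2} .
$$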

Observe that Var(Vt^) < 2(£), |/ij,;|) 2 (er 2 i V er 2 2 ) 2 < oo for kj < i < kj + a, 
where 

a\ x = E , and a 2 2 - E [W* ;2 ] (1.57) 

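The three last displays imply (1.56) and thus that $B_{n_j}$ is bounded in probability.
Moreover, since $f_{n_j}$ reads

$$
f_{n_j} = \frac{1}{\sqrt{n_j}} \left[ (1-\kappa) \sum_{i=1}^{\lfloor n_j\kappa \rfloor} \sigma^2_{j;1} - \kappa \sum_{i=\lfloor n_j\kappa \rfloor+1}^{\lfloor n_j\kappa \rfloor+a} \sigma^2_{j,i} - \kappa \sum_{i=\lfloor n_j\kappa \rfloor+a+1}^{n_j} \sigma^2_{j;2} \right] = \sqrt{n_j}\,\kappa(1-\kappa)\left(\sigma^2_{j;1} - \sigma^2_{j;2}\right) + O\!\left(n_j^{-1/2}\right) ,
$$

we get

$$
S_{n_j}(\kappa) - \kappa S_{n_j}(1) = \sqrt{n_j}\,\kappa(1-\kappa)\left(\sigma^2_{j;1} - \sigma^2_{j;2}\right) + O_P(1) . \tag{1.58}
$$

We now study the denominator $s^2_{j,n_j}$ in (1.13). Denote by

$$
\bar\sigma^2_j = \frac{1}{n_j} \sum_{i=1}^{n_j} \sigma^2_{j,i}
$$

the expectation of the scalogram (which now differs from the wavelet spectrum),
where $\sigma^2_{j,i} = \mathbb{E}[W^2_{j,i}]$. Let us consider, for $\tau \in \{0, \dots, q(n_j)\}$, $\hat\gamma_j(\tau)$ the empirical covariance
of the wavelet coefficients defined in (1.15).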



% w = ^ E - *?x^ - -i) - (i + -) (-1 - 



j i—i 



i—rij — r+1 



Using Minkowski's inequality and (1.56), there exists a constant $C$ such that
for all $1 \le l \le l + \tau \le n_j$,



3 ' 



l+T 



E - < 



< 



E W - 4 



i=Z 



<C(t 1/2 +t), 



Z+r 



EK<-*?) 



and similarly 



1^2 -211 ^ 



C 



By combining these two latter bounds, the Cauchy-Schwarz inequality implies 
that 

C(rV2 + T ) 

■ \ , ' O ! :> { II ; ; - O ; f 



i=l 



3/2 



Recall that $s^2_{j,n_j} = \sum_{\tau=-q}^{q} w_\tau(q)\,\hat\gamma_j(\tau)$ where $w_\tau(q)$ are the so-called Bartlett
weights defined in (1.16). We now use the bounds above to identify the limit
of $s^2_{j,n_j}$ as the sample size goes to infinity. The two previous identities imply
that

f>(«) (1 + ^) h-^mc-C* 

t=o v n ^ 

and 



T = 

Therefore, we obtain 



< C- 



3/2 ' 



S lnj = E Wr(qhj(T)+Op 



n 



3/2 



(1.59) 



where (t) is defined by 



7>w = - E(^-^)(^-^)- 



J i—l 



(1.60) 



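Observe that since $q = o(n_j)$, $k_j = \lfloor n_j\kappa \rfloor$ and $0 \le \tau \le q$, then for any given
integer $a$ and $n$ large enough, $0 \le \tau \le k_j \le k_j + a \le n_j - \tau$; thus in (1.60) we
may write $\sum_{i=1}^{n_j-\tau} = \sum_{i=1}^{k_j-\tau} + \sum_{i=k_j-\tau+1}^{k_j+a} + \sum_{i=k_j+a+1}^{n_j-\tau}$. Using $\sigma^2_{j;1}$ and $\sigma^2_{j;2}$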
in (1.60) and straightforward bounds that essentially follow from (1.56), we 
get s|, n . = sl n . + P , where 



s "k = E ^r(g)(^7 3 -;i(r) + ^ ^ % ;2 (t) 



+ ^- M (4i-^) 2 + 



71 j A/J ft 



(fj;2 - G T, 



with 



fc 3 -r 



i=l 



7j;2(r) = 



77/ 7 /t -j tl 



E (^-4 2 )(^ i+r -4 2 ) 



i=fcj +a+l 



Using that $\bar\sigma^2_j \to \kappa\sigma^2_{j;1} + (1-\kappa)\sigma^2_{j;2}$ as $n_j \to \infty$, and that, for $\epsilon = 1, 2$,
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P 1 



/. 



•71 




dcf 



IWA)| 2 dA, 



7T 



— 7T 



9 



we obtain 




Using (1.58), the last display and that o p (l) + Op = °p{l)^ we finally 

obtain 




which concludes the proof of Theorem 3. 

1.5.2 Power of the test in multiple scales case 

The results obtained in the previous section in the single scale case easily
extend to the test procedure designed to handle the multiple scales case. The
alternative is specified exactly in the same way as in the single scale case
but, instead of considering the square of the wavelet coefficients at a given
scale, we now study the behavior of the between-scale process. Consider the
following process for $\epsilon = 1, 2$,



where $J_1$ and $J_2$ are respectively the finest and the coarsest scale considered
in the test, $W_{j,i;\epsilon}$ are defined in (1.53) and (1.54) and $\Gamma_{J_1,J_2;\epsilon}$ the $(J_2 - J_1 + 1) \times (J_2 - J_1 + 1)$
symmetric non negative matrix such that



/; /: /i ^Cov; / ||Dj jU;e (A; ,/)|| 2 dX, (1.62) 





with $1 \le j, j' \le J_2 - J_1 + 1$ for $\epsilon = 1, 2$.

Theorem 4. Let $\{X_k^{(n)}\}_{k\in\mathbb{Z}}$ be a sequence of processes specified by (1.47)
and (1.48). Assume that (1.47) holds for at least one $j \in \{J_1, \dots, J_2\}$ and that
at least one of the two matrices $\Gamma_{J_1,J_2;\epsilon}$, $\epsilon = 1, 2$, defined in (1.62) is positive
definite. Finally, assume that the number of lags
$q(n_{J_2})$ in the Bartlett estimate of the covariance matrix (1.36) is non decreasing
and:

$$
q(n_{J_2}) \to \infty \quad \text{and} \quad \frac{q^2(n_{J_2})}{n_{J_2}} \to 0 \quad \text{as } n_{J_2} \to \infty . \tag{1.63}
$$

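Then, the W2-CUSUM test statistic $T_{J_1,J_2}$ defined by (1.41) satisfies

$$
\frac{n_{J_2}}{2q(n_{J_2})}\,\kappa(1-\kappa)\,(1 + o_p(1)) \le T_{J_1,J_2} \stackrel{P}{\longrightarrow} \infty \quad \text{as } n_{J_2} \to \infty .
$$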

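Proof. As in the single scale case, we drop the dependence on $n_{J_2}$ in the
expression of $q$ in this proof. Let $k_j = \lfloor n_j\kappa \rfloor$ be the change point in
the wavelet spectrum at scale $j$. Then, using (1.38), we have that $T_{J_1,J_2} \ge T_{J_1,J_2}(\kappa)$,
the quadratic form (1.41) evaluated at $t = \kappa$, where

$$
S_{J_1,J_2}(\kappa) - \kappa S_{J_1,J_2}(1) = \frac{1}{\sqrt{n_{J_2}}} \left[ \sqrt{n_j}\left(B_{n_j} + f_{n_j}\right) \right]_{j=J_1,\dots,J_2} ,
$$

where $B_{n_j}$ and $f_{n_j}$ are defined respectively by (1.51) and (1.52). Hence as in
(1.58), we have

$$
S_{J_1,J_2}(\kappa) - \kappa S_{J_1,J_2}(1) = \sqrt{n_{J_2}}\,\kappa(1-\kappa)\,\Delta + O_P(1) ,
$$

where $\Delta = \left[\sigma^2_{J_1,J_2;1} - \sigma^2_{J_1,J_2;2}\right]^T$ and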

We now study the asymptotic behavior of $\hat\Gamma_{J_1,J_2}$. Using similar arguments as
those leading to (1.61) in the proof of Theorem 3, we have

f JuJ2 = 2 qK (i - k)aa t + nr JuJ2 . tl + (1 - K,)r JuJ2 . 2 

+ Op (_L) +Op( ,-. ) + 0p (£). 

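For $\Gamma$ a positive definite matrix, consider the matrix $M(\Gamma) = \Gamma + 2q\kappa(1-\kappa)\Delta\Delta^T$.
Using the matrix inversion lemma, the inverse of $M(\Gamma)$ may be expressed as

$$
M^{-1}(\Gamma) = \Gamma^{-1} - \frac{2q\kappa(1-\kappa)\,\Gamma^{-1}\Delta\Delta^T\Gamma^{-1}}{1 + 2q\kappa(1-\kappa)\,\Delta^T\Gamma^{-1}\Delta} ,
$$

which implies that

$$
\Delta^T M^{-1}(\Gamma)\Delta = \frac{\Delta^T\Gamma^{-1}\Delta}{1 + 2q\kappa(1-\kappa)\,\Delta^T\Gamma^{-1}\Delta} .
$$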
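The rank-one inversion identity used here (the Sherman-Morrison form of the matrix inversion lemma) can be checked numerically. The sketch below is illustrative only: the matrix, the vector `delta` and the values of $q$ and $\kappa$ are arbitrary test inputs, not quantities from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
A = rng.standard_normal((d, d))
Gamma = A @ A.T + d * np.eye(d)        # arbitrary symmetric positive definite matrix
delta = rng.standard_normal(d)         # arbitrary vector in the rank-one term
q, kappa = 50, 0.3                     # illustrative bandwidth and breakpoint
c = 2 * q * kappa * (1 - kappa)

M = Gamma + c * np.outer(delta, delta)
Ginv = np.linalg.inv(Gamma)
quad = delta @ Ginv @ delta

# Closed-form inverse given by the matrix inversion lemma.
M_inv_formula = Ginv - c * (Ginv @ np.outer(delta, delta) @ Ginv) / (1 + c * quad)

lhs = delta @ np.linalg.inv(M) @ delta  # quadratic form computed directly
rhs = quad / (1 + c * quad)             # quadratic form from the identity
```

The identity also makes the role of the bandwidth visible: as $q$ grows, the quadratic form is bounded above by $1/(2q\kappa(1-\kappa))$ whatever the value of the vector, which is where the factor $1/(2q)$ in the lower bound of Theorem 4 comes from.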

Applying these two last relations to $\Gamma_0 = \kappa\Gamma_{J_1,J_2;1} + (1-\kappa)\Gamma_{J_1,J_2;2}$, which is
symmetric and positive definite (since, under the stated assumptions, at least
one of the two matrices $\Gamma_{J_1,J_2;\epsilon}$, $\epsilon = 1, 2$, is positive definite), we have

T Ju , h > k 2 (1 - Kfnj^M- 1 (r + Op (jL-^j + Opiq- 1 )^ A + P (1) 

A T r Q - 1 A + Op(^-)+o P (q- 1 

=^«(i-«)(i +0p (i)) . 
Thus $T_{J_1,J_2} \stackrel{P}{\longrightarrow} \infty$ as $n_{J_2} \to \infty$, which completes the proof of Theorem 4.

Remark 3. The term corresponding to the "bias" term $\kappa\Gamma_{J_1,J_2;1} + (1-\kappa)\Gamma_{J_1,J_2;2}$
in the single scale case is $\frac{1}{\pi}\int_{-\pi}^{\pi} \left\{\kappa|D_{j,0;1}(\lambda)|^2 + (1-\kappa)|D_{j,0;2}(\lambda)|^2\right\} d\lambda = O(1)$, which
can be neglected since the main term in $s^2_{j,n_j}$ is of order $q \to \infty$. In the multiple
scale case, the main term in $\hat\Gamma_{J_1,J_2}$ is still of order $q$ but is no longer invertible
(the rank of the leading term is equal to 1). A closer look is thus necessary and
the term $\kappa\Gamma_{J_1,J_2;1} + (1-\kappa)\Gamma_{J_1,J_2;2}$ has to be taken into account. This
also explains why we need the more stringent condition (1.63) on the bandwidth
size in the multiple scales case.



1.6 Some examples 

In this section, we report the results of a limited Monte-Carlo experiment to
assess the finite sample properties of the test procedure. Recall that the test
rejects the null if either $\mathrm{CVM}(J_1,J_2)$ or $\mathrm{KSM}(J_1,J_2)$, defined in (1.43) and
(1.45), exceeds the $(1-\alpha)$-th quantile of the distributions $C(J_2 - J_1 + 1)$
and $D(J_2 - J_1 + 1)$, specified in (1.44) and (1.46). The quantiles are reported
in Tables 1.1 and 1.2, and have been obtained by truncating the
series expansion of the cumulative distribution function. To study the influence
of the strength of the dependence on the test procedure, we consider
different classes of Gaussian processes, including white noise, autoregressive
moving average (ARMA) processes as well as fractionally integrated ARMA
(ARFIMA(p, d, q)) processes, which are known to be long range dependent. In
all the simulations we set the lowest scale to $J_1 = 1$ and vary the coarsest scale
$J_2 = J$. We used a wide range of values of the sample size $n$, of the number of
scales $J$ and of the parameters of the ARMA and ARFIMA processes but, to
conserve space, we present the results only for $n = 512, 1024, 2048, 4096, 8192$,
$J = 3, 4, 5$ and four different models: an AR(1) process with parameter 0.9, a
MA(1) process with parameter 0.9, and two ARFIMA(1,d,1) processes with
memory parameter $d = 0.3$ and $d = 0.4$, and the same AR and MA coefficients,
set to 0.9 and 0.1. In our simulations, we have used the Newey-West
estimate of the bandwidth $q(n_j)$ for the covariance estimator (as implemented
in the R-package sandwich).
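As a rough illustration of how the two statistics recalled above might be evaluated numerically, the following Python sketch computes the quadratic form (1.41) on the grid $t = i/n$ and then takes its average and its maximum. It is a sketch under stated assumptions rather than the authors' implementation: the function name `cusum_statistics` is hypothetical, the rows of `Y` are assumed to be vectors whose partial sums approximate the vector of partial sums across scales, the covariance estimate `Gamma_hat` is supplied by the caller, and the integral in (1.43) is replaced by a Riemann sum.

```python
import numpy as np

def cusum_statistics(Y, Gamma_hat):
    """Evaluate the CUSUM quadratic form on a grid and two functionals of it.

    Y:         array (n, d), one vector of squared-coefficient sums per row.
    Gamma_hat: array (d, d), estimate of the long-run covariance matrix.
    Returns the path T(i/n), its mean (Cramer-von Mises type statistic) and
    its maximum (Kolmogorov-Smirnov type statistic).
    """
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    S = np.cumsum(Y, axis=0) / np.sqrt(n)           # partial sums S(i/n)
    t = np.arange(1, n + 1) / n
    bridge = S - t[:, None] * S[-1]                 # S(t) - t S(1)
    Ginv = np.linalg.inv(Gamma_hat)
    T = np.einsum("ij,jk,ik->i", bridge, Ginv, bridge)  # quadratic form at each t
    cvm = T.mean()                                  # Riemann sum over [0, 1]
    ksm = T.max()                                   # supremum over the grid
    return T, cvm, ksm
```

The path vanishes at $t = 1$ by construction, which provides a simple check of the centring step.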

Asymptotic level of KSM and CVM. 

We investigate the finite-sample behavior of the test statistics $\mathrm{CVM}(J_1,J_2)$
and $\mathrm{KSM}(J_1,J_2)$ by computing the number of times that the null hypothesis
is rejected in 1000 independent replications of each of these processes under
$H_0$, when the asymptotic level is set to 0.05.



White noise

n            512     1024    2048    4096    8192
J = 3 KSM    0.02    0.01    0.03    0.02    0.02
J = 3 CVM    0.05    0.045   0.033   0.02    0.02
J = 4 KSM    0.047   0.04    0.04    0.02    0.02
J = 4 CVM    0.041   0.02    0.016   0.016   0.01
J = 5 KSM    0.09    0.031   0.02    0.025   0.02
J = 5 CVM    0.086   0.024   0.012   0.012   0.02

Table 1.3. Empirical level of KSM - CVM for a white noise.



MA(1) [θ = 0.9]

n            512     1024    2048    4096    8192
J = 3 KSM    0.028   0.012   0.012   0.012   0.02
J = 3 CVM    0.029   0.02    0.016   0.016   0.01
J = 4 KSM    0.055   0.032   0.05    0.025   0.02
J = 4 CVM    0.05    0.05    0.03    0.02    0.02
J = 5 KSM    0.17    0.068   0.02    0.02    0.02
J = 5 CVM    0.13    0.052   0.026   0.021   0.02

Table 1.4. Empirical level of KSM - CVM for a MA(1) process.

AR(1) [φ = 0.9]

n            512     1024    2048    4096    8192
J = 3 KSM    0.083   0.073   0.072   0.051   0.04
J = 3 CVM    0.05    0.05    0.043   0.032   0.03
J = 4 KSM    0.26    0.134   0.1     0.082   0.073
J = 4 CVM    0.14    0.092   0.062   0.04    0.038
J = 5 KSM    0.547   0.314   0.254   0.22    0.11
J = 5 CVM    0.378   0.221   0.162   0.14    0.093

Table 1.5. Empirical level of KSM - CVM for an AR(1) process.



ARFIMA(1,0.3,1) [φ = 0.9, θ = 0.1]

n            512     1024    2048    4096    8192
J = 3 KSM    0.068   0.047   0.024   0.021   0.02
J = 3 CVM    0.05    0.038   0.03    0.02    0.02
J = 4 KSM    0.45    0.42    0.31    0.172   0.098
J = 4 CVM    0.39    0.32    0.20    0.11    0.061
J = 5 KSM    0.57    0.42    0.349   0.229   0.2
J = 5 CVM    0.41    0.352   0.192   0.16    0.11

Table 1.6. Empirical level of KSM - CVM for an ARFIMA(1, 0.3, 1) process.

ARFIMA(1,0.4,1) [φ = 0.9, θ = 0.1]

n            512     1024    2048    4096    8192
J = 3 KSM    0.11    0.063   0.058   0.044   0.031
J = 3 CVM    0.065   0.05    0.043   0.028   0.02
J = 4 KSM    0.512   0.322   0.26    0.2     0.18
J = 4 CVM    0.49    0.2     0.192   0.16    0.08
J = 5 KSM    0.7     0.514   0.4     0.321   0.214
J = 5 CVM    0.59    0.29    0.262   0.196   0.121

Table 1.7. Empirical level of KSM - CVM for an ARFIMA(1, 0.4, 1) process.



We notice that in general the empirical levels for the CVM test are globally
more accurate than the ones for the KSM test, the difference being more significant
when the strength of the dependence is increased, or when the number
of scales that are tested simultaneously gets larger. The tests are slightly too
conservative in the white noise and the MA case (Tables 1.3 and 1.4); in
the AR(1) case and in the ARFIMA cases, the test rejects the null much too
often when the number of scales is large compared to the sample size (the
difficult problem being in that case to estimate the covariance matrix of the
test). In the AR(1) case with J = 4 (Table 1.5), the number of samples required to meet the target rejection
rate can be as large as n = 4096 for the CVM test and n = 8192 for the
KSM test. The situation is even worse in the ARFIMA case (Tables 1.6 and
1.7). When the number of scales is equal to 4 or 5, the test rejects the null
hypothesis much too often.
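The entries of Tables 1.3 to 1.7 are rejection frequencies under the null hypothesis. A toy, single-scale version of such an experiment can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the chapter: it uses Haar wavelet coefficients at scale 1 only, a fixed bandwidth `q` instead of the Newey-West automatic selection, Bartlett weights $1 - |\tau|/(q+1)$, and the critical value 1.358 (the $d = 1$, 0.95 entry of Table 1.2) applied to the normalised supremum statistic used in the proof of Theorem 3. All function names are hypothetical.

```python
import numpy as np

def haar_scale1(x):
    """Haar wavelet coefficients at the finest scale (assumed wavelet)."""
    x = np.asarray(x, dtype=float)
    m = len(x) // 2
    return (x[0:2 * m:2] - x[1:2 * m:2]) / np.sqrt(2.0)

def single_scale_ks(w, q):
    """Normalised CUSUM supremum computed on the squared coefficients."""
    y = w ** 2
    n = len(y)
    yc = y - y.mean()
    s2 = yc @ yc / n                                  # lag-0 term
    for tau in range(1, q + 1):
        s2 += 2.0 * (1.0 - tau / (q + 1.0)) * (yc[:-tau] @ yc[tau:]) / n
    S = np.cumsum(y) / np.sqrt(n)
    t = np.arange(1, n + 1) / n
    return np.max(np.abs(S - t * S[-1])) / np.sqrt(s2)

def empirical_level(n=1024, q=3, n_rep=200, crit=1.358, seed=0):
    """Fraction of white-noise replications for which the test rejects."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_rep):
        w = haar_scale1(rng.standard_normal(n))
        rejections += int(single_scale_ks(w, q) > crit)
    return rejections / n_rep

level = empirical_level()
```

With more scales, the scalar variance estimate would be replaced by the Bartlett covariance matrix and the critical value by the entry of Table 1.1 or 1.2 for the corresponding $d$.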




[Figure: panel "Pvalues under H0 of ARFIMA(0.9,d,0.1)"; legend: d=0.3, d=0.4; horizontal axis from 200 to 1000.]

Fig. 1.2. P-value under $H_0$ of the distribution $D(J)$: $n = 1024$ for white noise
and MA(1) processes and $n = 4096$ for AR(1) and ARFIMA(1,d,1) processes; the
coarsest scale is $J = 4$ for white noise, MA and AR processes and $J = 3$ for the
ARFIMA process. The finest scale is $J_1 = 1$.

Power of KSM and CVM. 

We assess the power of the test statistics by computing them in the presence
of a change in the spectral density. To do so, we consider an observation
obtained by concatenation of $n_1$ observations from a first process and $n_2$
observations from a second process, independent from the first one and having
a different spectral density. The length of the resulting observations is
$n = n_1 + n_2$. In all cases, we set $n_1 = n_2 = n/2$, and we present the results
for $n_1 = 512, 1024, 2048, 4096$ and scales $J = 4, 5$. We consider the following
situations: the two processes are white Gaussian noise with two different variances,
two AR processes with different values of the autoregressive coefficient,
two MA processes with different values of the moving average coefficient and
two ARFIMA processes with the same moving average and autoregressive coefficients
but different values of the memory parameter $d$. The scenario considered is a
bit artificial but is introduced here to assess the ability of the test to detect
abrupt changes in the spectral content. Over 1000 simulations, we report the
fraction of times $H_1$ was accepted; the results are given in Tables 1.8 to 1.11.
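The concatenation scheme used for the power study can be sketched in Python for the AR(1) case. This is a minimal illustration, not the simulation code behind the tables: the function names, the unit-variance Gaussian innovations, the burn-in length and the seed are all assumptions made for the example.

```python
import numpy as np

def simulate_ar1(n, phi, rng, burn_in=500):
    """Simulate n values of a Gaussian AR(1) process with coefficient phi."""
    e = rng.standard_normal(n + burn_in)
    x = np.empty(n + burn_in)
    x[0] = e[0]
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + e[t]
    return x[burn_in:]                     # drop the burn-in segment

def concatenated_series(n1, n2, phi1, phi2, seed=0):
    """Concatenate two independent AR(1) segments with different coefficients."""
    rng = np.random.default_rng(seed)
    first = simulate_ar1(n1, phi1, rng)
    second = simulate_ar1(n2, phi2, rng)
    return np.concatenate([first, second])

# Same configuration as the AR(1)+AR(1) scenario: coefficients 0.9 then 0.5.
x = concatenated_series(512, 512, 0.9, 0.5)
```

The resulting series has an abrupt change in spectral density at index $n_1$; the other scenarios follow the same pattern with a different generator for each segment.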



White noise [σ_1 = 1, σ_2 = 0.7]

n_1 = n_2    512     1024    2048    4096
J = 4 KSM    0.39    0.78    0.89    0.95
J = 4 CVM    0.32    0.79    0.85    0.9
J = 5 KSM    0.42    0.79    0.91    0.97
J = 5 CVM    0.40    0.78    0.9     0.9

Table 1.8. Power of KSM - CVM on two white noise processes.

MA(1)+MA(1) [θ_1 = 0.9, θ_2 = 0.5]

n_1 = n_2    512     1024    2048    4096
J = 4 KSM    0.39    0.69    0.86    0.91
J = 4 CVM    0.31    0.6     0.76    0.93
J = 5 KSM    0.57    0.74    0.84    0.94
J = 5 CVM    0.46    0.69    0.79    0.96

Table 1.9. Power of KSM - CVM on a concatenation of two different MA processes.

AR(1)+AR(1) [φ_1 = 0.9, φ_2 = 0.5]

n_1 = n_2    512     1024    2048    4096
J = 4 KSM    0.59    0.72    0.81    0.87
J = 4 CVM    0.53    0.68    0.79    0.9
J = 5 KSM    0.75    0.81    0.94    0.92
J = 5 CVM    0.7     0.75    0.89    0.91

Table 1.10. Power of KSM - CVM on a concatenation of two different AR processes.

The power of our two statistics is satisfactory for the considered
processes, especially as the sample size increases.

ARFIMA(1,0.3,1) + ARFIMA(1,0.4,1) [φ = 0.9, θ = 0.1]

n_1 = n_2    512     1024    2048    4096
J = 4 KSM    0.86    0.84    0.8     0.81
J = 4 CVM    0.81    0.76    0.78    0.76
J = 5 KSM    0.94    0.94    0.9     0.92
J = 5 CVM    0.93    0.92    0.96    0.91

Table 1.11. Power of KSM - CVM on two ARFIMA(1,d,1) processes with the same AR and MA part
but two different values of the memory parameter d.



[Figure: empirical power curves; legend: AR(0.9)+AR(0.5), MA(0.9)+MA(0.5), WN(s1=1,s2=0.7), ARFIMA(d=0.3,d=0.4); horizontal axis from 200 to 1000.]

Fig. 1.3. Empirical power of KSM(d = 4) for white noise, AR, MA and ARFIMA
processes.



Estimation of the change point in the original process. 

We know that for each scale $j$, the number $n_j$ of wavelet coefficients is
$n_j = 2^{-j}(n - T + 1) - T + 1$. If we denote by $k_j$ the change point in the wavelet
coefficients at scale $j$ and by $k$ the change point in the original signal, then
$k = 2^{j}(k_j + T - 1) + T - 1$. In this paragraph, we estimate the change point in
the generalized spectral density of a process when it exists and give its 95%
confidence interval. For that, we proceed as before. We consider an observation
obtained by concatenation of $n_1$ observations from a first process and $n_2$
observations from a second process, independent from the first one and having
a different spectral density. The length of the resulting observations is $n = n_1 + n_2$.
We estimate the change point in the process and present the results for
$n_1 = 512, 1024, 4096, 8192$, $n_2 = 512, 2048, 8192$, $J = 3$, the statistic CVM,
two AR processes with different values of the autoregressive coefficient and
two ARFIMA processes with the same moving average and autoregressive coefficients
but different values of the memory parameter $d$. For 10000 simulations, the
bootstrap confidence intervals obtained are reported in the tables below, together
with the empirical mean and the median of the estimated change point.
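The index bookkeeping between the wavelet domain and the original signal can be made concrete with a short Python sketch. The function names are hypothetical, the integer part taken in the first function is an assumption (the formula in the text is stated without rounding), and the value $T = 2$ used in the checks is purely illustrative.

```python
import math

def n_wavelet_coeffs(n, j, T):
    """Number of wavelet coefficients at scale j for a sample of size n.

    Implements n_j = 2**(-j) * (n - T + 1) - T + 1, rounded down
    (the rounding is an assumption of this sketch).
    """
    return math.floor(2.0 ** (-j) * (n - T + 1) - T + 1)

def change_point_in_signal(k_j, j, T):
    """Map a change point k_j at scale j back to the original time index."""
    return 2 ** j * (k_j + T - 1) + T - 1
```

For instance, with $n = 1024$, $j = 3$ and the illustrative value $T = 2$, the first function returns 126 coefficients, and a change point located at $k_j = 63$ maps back to $k = 513$ in the original series.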

• [AR(1), φ = 0.9] and [AR(1), φ = 0.5]

n_1           512         512          512          1024         4096          8192
n_2           512         2048         8192         1024         4096          8192
MEAN_CVM      478         822          1853         965          3945          8009
MEDIAN_CVM    517         692          1453         1007         4039          8119
CI_CVM        [283,661]   [380,1369]   [523,3534]   [637,1350]   [3095,4614]   [7962,8825]

Table 1.12. Estimation of the change point and confidence interval at 95% in the
generalized spectral density of a process which is obtained by concatenation of two
AR(1) processes.

• [ARFIMA(1, 0.2, 1)] and [ARFIMA(1, 0.3, 1)], with φ = 0.9 and θ = 0.2

n_1           512         512          512          1024         4096          8192
n_2           512         2048         8192         1024         4096          8192
MEAN_CVM      531         1162         3172         1037         4129          8037
MEDIAN_CVM    517         1115         3215         1035         4155          8159
CI_CVM        [227,835]   [375,1483]   [817,6300]   [527,1569]   [2985,5830]   [6162,9976]

Table 1.13. Estimation of the change point and confidence interval at 95% in the
generalized spectral density of a process which is obtained by concatenation of two
ARFIMA(1,d,1) processes.



We remark that the change point always belongs to the considered confidence
interval, except for $n_1 = 512$, $n_2 = 8192$, where the confidence intervals
([523, 3534] in Table 1.12 and [817, 6300] in Table 1.13) do not contain the change point $k = 512$. One can notice
that when the size of the sample increases and $n_1 = n_2$, the interval becomes
more accurate. However, as expected, this interval becomes less accurate when
the change appears either at the beginning or at the end of the observations.